feat: Enable for exporting unmerged HF Lora Adapter #6225
base: main
Conversation
Summary of Changes

Hello @jason9693, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request introduces a significant enhancement by allowing users to export unmerged LoRA adapters from Megatron Core models into a format compatible with HuggingFace's PEFT library. This feature streamlines the process of using LoRA fine-tuned models across different frameworks, providing flexibility and reducing the need for full model retraining. The changes involve adding a dedicated conversion module and integrating it into the existing model export pipeline, specifically targeting GPT models.
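As background for reviewers, an adapter exported in this unmerged format should be loadable with the standard PEFT API. A minimal sketch, using the Qwen3 A3B base model mentioned later in this thread and a hypothetical adapter directory (both paths are placeholders, not defined by this PR):

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Placeholder paths: substitute the real base checkpoint and the exported adapter directory.
BASE_MODEL = 'Qwen/Qwen3-30B-A3B'
ADAPTER_DIR = './exported_lora_adapter'

base = AutoModelForCausalLM.from_pretrained(BASE_MODEL, torch_dtype='auto', device_map='auto').eval()
model = PeftModel.from_pretrained(base, ADAPTER_DIR).eval()  # adapter stays unmerged

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
inputs = tokenizer('Hello', return_tensors='pt').to(base.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=16)[0], skip_special_tokens=True))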
Code Review
This pull request introduces a feature to export unmerged LoRA adapters from Megatron Core to HuggingFace PEFT format. The implementation is well-structured, with the core logic encapsulated in a new file. However, I've identified a critical issue in the conversion script that could lead to exported adapter files being overwritten. Additionally, there are a couple of high-severity bugs in the LoRA weight conversion logic that would prevent exporting adapters with biases and would handle other parameters inconsistently. I've provided detailed comments and suggestions to address these issues.
Hello! Thanks for your PR. Could you please change the comments to English? 😊

Oh, I forgot to change the comments.
@Jintao-Huang I just checked that all of my known bugs have been resolved and pushed the changes along with the conversion test code. FYI, this was tested on a Qwen3 A3B MoE model.
Hello! I will review it as soon as possible today or tomorrow. Please wait. 😊

Could you use pre-commit to format the code to pass the lint tests? (use py310/py311)
@Jintao-Huang I just ran pre-commit and committed those changes.

@Jintao-Huang Are there any actions I need to take?

/gemini review
Code Review
This PR introduces a valuable feature to export LoRA adapters in the HuggingFace PEFT format without merging them, which is useful for serving with frameworks like vLLM. The implementation is well-structured, with the core conversion logic neatly encapsulated in mcore2hf_lora.py. The changes to the export script in convert.py are clean and integrate the new functionality well.
However, I've identified a few issues. The LoRA conversion logic appears to miss handling biases, which could lead to incorrect conversions for LoRA layers that use them. Additionally, the new test script test_lora_export.py has some critical flaws: it uses hardcoded paths, making it non-executable, and the weight comparison logic is incorrect as it operates on an already merged model. The test coverage for converted modules is also incomplete. Addressing these points will ensure the feature is robust and verifiable.
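To illustrate the serving use case mentioned above: an unmerged PEFT adapter directory can be attached at request time through vLLM's LoRA support. A rough sketch with placeholder model and adapter paths (whether a particular adapter is servable, e.g. one targeting MoE expert layers, depends on vLLM's LoRA coverage):

from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

# Placeholder paths: base model and the exported (unmerged) PEFT adapter directory.
llm = LLM(model='Qwen/Qwen3-30B-A3B', enable_lora=True)
outputs = llm.generate(
    ['Hello'],
    SamplingParams(max_tokens=16),
    lora_request=LoRARequest('exported_adapter', 1, './exported_lora_adapter'),
)
print(outputs[0].outputs[0].text)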
The first reviewed hunk, the push helper:

def push(dst, key, tensor):
    """Create independent copy of tensor for saving + ensure contiguous memory"""
    t = tensor.detach().clone().contiguous()
    if key in dst:
        raise ValueError(f'Duplicate key: {key}')
    if 'weight' not in key:
        logger.debug(f'Skipping non-weight key: {key}')
        return
    key = remap_key_for_peft(key)
    dst[key] = t
The push function currently skips any tensor whose key does not contain 'weight'. This will cause LoRA biases to be dropped during conversion if they are enabled (e.g., with lora_bias='all'). This could lead to incorrect behavior for the exported adapter. The check should be removed to ensure all parts of the LoRA layers, including biases, are correctly processed and saved.
Suggested change:

def push(dst, key, tensor):
    """Create independent copy of tensor for saving + ensure contiguous memory"""
    t = tensor.detach().clone().contiguous()
    if key in dst:
        raise ValueError(f'Duplicate key: {key}')
    key = remap_key_for_peft(key)
    dst[key] = t

The next reviewed hunk, from load_models in test_lora_export.py:

base = AutoModelForCausalLM.from_pretrained(BASE_MODEL, torch_dtype=DTYPE, device_map={'': GPU_PEFT}).eval()

peft_model = PeftModel.from_pretrained(base, ADAPTER_DIR).eval()
model = peft_model.merge_and_unload().eval()
The load_models function loads a PeftModel and then immediately merges it using peft_model.merge_and_unload(). The resulting merged model is then used in compare_weights. However, compare_weights and its helper peft_effective_weight are designed to work with an unmerged PeftModel to calculate the effective weight by combining the base weight and the LoRA delta (W0 + B@A*scale). When passed a merged model, peft_has_lora will be false, and the test will simply compare the merged weight against itself, making the check ineffective. The test should compare a pre-merged model against an unmerged PeftModel. To fix this, the model returned should be the unmerged PeftModel.
Suggested change:

-model = peft_model.merge_and_unload().eval()
+model = peft_model
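To make the intended check concrete, here is a minimal sketch of computing the effective weight W0 + B @ A * scaling from an unmerged PEFT LoRA linear layer. It mirrors what the test's peft_effective_weight helper is described as doing; the attribute names (base_layer, lora_A, lora_B, scaling, the 'default' adapter key) follow current peft LoRA layers and should be treated as assumptions rather than this PR's exact code:

def effective_weight_sketch(module, adapter='default'):
    """Return W0 + B @ A * scaling for a peft lora.Linear, or the plain weight if no LoRA is attached."""
    if not hasattr(module, 'lora_A') or adapter not in module.lora_A:
        return module.weight if hasattr(module, 'weight') else module.base_layer.weight
    w0 = module.base_layer.weight
    a = module.lora_A[adapter].weight   # shape (r, in_features)
    b = module.lora_B[adapter].weight   # shape (out_features, r)
    return w0 + (b @ a) * module.scaling[adapter]

# Hypothetical comparison against the corresponding layer of a pre-merged reference model:
# torch.testing.assert_close(effective_weight_sketch(peft_mod), merged_mod.weight, rtol=1e-3, atol=1e-3)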
The next reviewed hunk, find_linear_modules from test_lora_export.py:

def find_linear_modules(model: nn.Module, suffixes=('q_proj', 'k_proj', 'v_proj', 'o_proj')) -> Dict[str, nn.Linear]:
    out = {}
    for name, mod in model.named_modules():
        if isinstance(mod, nn.Linear) and any(name.endswith(f'.self_attn.{suf}') for suf in suffixes):
            out[name] = mod
    return out
The find_linear_modules function is limited to finding LoRA modules within self-attention blocks (e.g., q_proj, o_proj). However, the conversion logic also supports LoRA on MLP layers (gate_proj, up_proj, down_proj) and MoE experts. The weight comparison test should be extended to cover these modules as well to ensure complete verification.
Suggested change:

-def find_linear_modules(model: nn.Module, suffixes=('q_proj', 'k_proj', 'v_proj', 'o_proj')) -> Dict[str, nn.Linear]:
-    out = {}
-    for name, mod in model.named_modules():
-        if isinstance(mod, nn.Linear) and any(name.endswith(f'.self_attn.{suf}') for suf in suffixes):
-            out[name] = mod
-    return out
+def find_linear_modules(model: nn.Module, suffixes=(
+        'q_proj', 'k_proj', 'v_proj', 'o_proj', 'gate_proj', 'up_proj', 'down_proj'
+)) -> Dict[str, nn.Linear]:
+    out = {}
+    for name, mod in model.named_modules():
+        if isinstance(mod, nn.Linear) and any(name.endswith(suf) for suf in suffixes):
+            out[name] = mod
+    return out
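One possible way to wire the extended lookup into the weight comparison, sketched under the assumption that the reference model is pre-merged, peft_model is left unmerged (as suggested above), and the effective_weight_sketch helper from the earlier sketch is available. The function name and flow here are illustrative, not the test's exact helpers:

import torch

def compare_all_modules_sketch(merged_model, peft_model, rtol=1e-3, atol=1e-3):
    """Compare every targeted linear weight of a merged reference model against W0 + B @ A * scaling."""
    reference = find_linear_modules(merged_model)
    tuned = dict(peft_model.get_base_model().named_modules())  # same module paths as the plain model
    mismatched = []
    for name, ref in reference.items():
        mod = tuned.get(name)
        if mod is None:
            continue
        w_eff = effective_weight_sketch(mod)
        if not torch.allclose(ref.weight, w_eff.to(ref.weight.dtype), rtol=rtol, atol=atol):
            mismatched.append(name)
    return mismatched

Modules wrapped by LoRA are resolved by name rather than by isinstance, since the peft wrapper replaces the original nn.Linear and would otherwise be missed.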
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
PR type
PR information
This PR is related to #5204.

How to Test